We illustrate the methods described in the seminar about LMMs and GLMMs.

LMMs

The PISA data

We start by loading the required packages, reading the PISA data and doing some simple transformations. For the interpretation of the variables we refer to the “PISA_readme.xls” file: here we report a screenshot.

library(nlme)
library(MASS)
library(lattice)
pisa <- read.table(file = "data/pisaSEL.txt", header = TRUE) 
pisa$CESCS <- pisa$ESCS - pisa$ESCSM
pisa$SEX <- factor(pisa$SEX)
pisa$CHANGE <- factor(pisa$CHANGE)
summary(pisa)
##     SCHOOLID         SEX           KIND           GRADE            MATH    
##  Min.   :14006   Female:631   Min.   :0.000   Min.   :0.000   Min.   :228  
##  1st Qu.:14015   Male  :638   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:456  
##  Median :15010                Median :1.000   Median :1.000   Median :509  
##  Mean   :15055                Mean   :0.971   Mean   :0.853   Mean   :510  
##  3rd Qu.:16002                3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:566  
##  Max.   :16012                Max.   :1.000   Max.   :1.000   Max.   :807  
##                                                                            
##       ESCS            CHANGE        FAMSTRUC          READ          TECH      
##  Min.   :-2.731   Invalid:   1   Min.   :0.000   Min.   :155   Min.   :0.000  
##  1st Qu.:-0.832   Miss   :   4   1st Qu.:1.000   1st Qu.:460   1st Qu.:0.000  
##  Median :-0.185   N/A    :   9   Median :1.000   Median :522   Median :0.000  
##  Mean   :-0.131   No     :  70   Mean   :0.807   Mean   :513   Mean   :0.458  
##  3rd Qu.: 0.461   Yes    :1185   3rd Qu.:1.000   3rd Qu.:570   3rd Qu.:1.000  
##  Max.   : 2.343                  Max.   :1.000   Max.   :778   Max.   :1.000  
##                                                                               
##      LYCEUM          EARLY           ESCSM            SCHLSIZE   
##  Min.   :0.000   Min.   :0.000   Min.   :-0.9083   Min.   : 135  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:-0.4744   1st Qu.: 424  
##  Median :0.000   Median :0.000   Median :-0.2253   Median : 676  
##  Mean   :0.252   Mean   :0.037   Mean   :-0.1311   Mean   : 698  
##  3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.: 0.0679   3rd Qu.: 845  
##  Max.   :1.000   Max.   :1.000   Max.   : 1.2104   Max.   :1568  
##                                                                  
##     SCMATEDU         SCMATBUI         PROPSTAY         ISTMED     
##  Min.   :-2.125   Min.   :-2.310   Min.   : 2.00   Min.   :0.000  
##  1st Qu.:-0.527   1st Qu.:-0.432   1st Qu.: 4.00   1st Qu.:0.000  
##  Median : 0.265   Median :-0.137   Median : 7.00   Median :0.000  
##  Mean   : 0.275   Mean   :-0.220   Mean   : 8.07   Mean   :0.453  
##  3rd Qu.: 1.113   3rd Qu.: 0.146   3rd Qu.:10.00   3rd Qu.:1.000  
##  Max.   : 2.200   Max.   : 1.488   Max.   :24.00   Max.   :1.000  
##                                    NA's   :27                     
##      ISTLAR        MEDDISC           MEDSTUR           PCGMED     
##  Min.   :0.00   Min.   :-1.1389   Min.   :-0.954   Min.   :0.000  
##  1st Qu.:0.00   1st Qu.:-0.5453   1st Qu.:-0.627   1st Qu.:0.000  
##  Median :0.00   Median :-0.0547   Median :-0.430   Median :0.000  
##  Mean   :0.43   Mean   :-0.1592   Mean   :-0.458   Mean   :0.277  
##  3rd Qu.:1.00   3rd Qu.: 0.1688   3rd Qu.:-0.222   3rd Qu.:1.000  
##  Max.   :1.00   Max.   : 0.7508   Max.   : 0.621   Max.   :1.000  
##                                                                   
##      PCGALT          CESCS        
##  Min.   :0.000   Min.   :-2.9638  
##  1st Qu.:0.000   1st Qu.:-0.5438  
##  Median :0.000   Median :-0.0127  
##  Mean   :0.403   Mean   : 0.0000  
##  3rd Qu.:1.000   3rd Qu.: 0.5183  
##  Max.   :1.000   Max.   : 2.4979  
## 

The first step is to declare the data frame to be a grouped data object, and then obtain the size of each group.

pisa <- groupedData(MATH ~ ESCS | SCHOOLID, data = pisa)
table(getGroups(pisa))    
## 
## 16010 15008 16012 16006 16003 16002 15314 16001 15003 16008 15013 16007 15005 
##    30    35    27    29    31    30    34    34    31    32    30    28    30 
## 16011 14013 14008 15006 15002 15016 14012 16005 16004 15004 14010 14011 15018 
##    31    31    30    35    33    30    33    34    31    34    30    32    32 
## 16009 15001 15015 15012 15010 14009 15007 15011 14014 15009 14006 15017 14015 
##    31    31    35    31    30    31    35    35    34    30    33    30    33 
## 14007 
##    33

We then perform some exploratory analyses, starting from a simple scatterplot, which suggests a clearly linear relationship between MATH and ESCS.

plot(MATH ~ ESCS, pisa, pch = 16) 
lines(lowess(pisa$ESCS, pisa$MATH), col = 2, lwd = 2)
abline(lm(MATH ~ ESCS, pisa), col = 4, lwd = 2)

Similar results are obtained for READ vs ESCS and for the two student scores.

plot(READ ~ ESCS, pisa, pch = 16)
lines(lowess(pisa$ESCS, pisa$READ), col = 2, lwd = 2)
abline(lm(READ ~ ESCS, pisa), col = 4, lwd = 2)
plot(MATH ~ READ, pisa, pch = 16)
lines(lowess(pisa$READ, pisa$MATH), col = 2, lwd = 2)
abline(lm(MATH ~ READ, pisa), col = 4, lwd = 2)

More informative displays are obtained by taking the hierarchical structure of the data into account. Some heterogeneity across schools is apparent.

xyplot(MATH ~ ESCS | SCHOOLID,  data = pisa,  main = "MATH",
       panel = function(x, y){ 
         panel.xyplot(x, y)
         panel.loess(x, y)
         panel.lmline(x, y, lty = 2) } )

We repeat the same plot for the relationships between the two student scores.

xyplot(READ  ~ MATH | SCHOOLID,  data = pisa, 
       main = "READ vs MATH",
       panel=function(x, y){ 
         panel.xyplot(x, y)
         panel.loess(x, y)
         panel.lmline(x, y, lty = 2) } )

The two scores vary strongly across the different types of school.

par(mfrow = c(1, 2), pty = "s", mar = rep(3, 4))
boxplot(MATH ~ LYCEUM * TECH, pisa, 
        names = c("Other", "Lyceum", "Tech", ""), 
        xlim = c(0.5, 3.5), xlab = "School type")
boxplot(READ ~ LYCEUM * TECH, pisa, 
        names = c("Other", "Lyceum", "Tech", ""), 
        xlim = c(0.5, 3.5),  xlab = "School type")

Therefore, it makes sense to break down the relationship between the math score and the socio-economic score by type of school:

xyplot(MATH ~ ESCS | SCHOOLID, data = subset(pisa, LYCEUM == TRUE), 
       main = "LYCEUM",
       panel = function(x, y){ 
         panel.xyplot(x, y)
         panel.loess(x, y)
         panel.lmline(x, y, lty = 2) } )
xyplot(MATH ~ ESCS | SCHOOLID, data = subset(pisa, LYCEUM == FALSE), 
       main = "NOT LYCEUM",
        panel = function(x, y){ 
          panel.xyplot(x, y)
          panel.loess(x, y)
          panel.lmline(x, y, lty = 2) } )

With hierarchical data, we need to be aware that marginalizing with respect to the grouping factor may alter the relationship existing between two variables. For example, the relationship between the school-level means of MATH and ESCS appears nonlinear, in striking contrast with the student-level relationship.

MATH_schoolmean <- with(pisa, tapply(MATH, SCHOOLID, mean))
ESCS_schoolmean <- with(pisa, tapply(ESCS, SCHOOLID, mean))
plot(MATH_schoolmean ~ ESCS_schoolmean)
lines(lowess(ESCS_schoolmean, MATH_schoolmean), col = 2, lwd = 2)

Spurious nonlinearity is also induced for the READ score.

READ_schoolmean <- with(pisa, tapply(READ, SCHOOLID, mean))
plot(READ_schoolmean ~ ESCS_schoolmean)
lines(lowess(ESCS_schoolmean, READ_schoolmean), col = 2, lwd = 2)

Starting from LMs

We start by fitting a very simple linear model, with no school effect. Standard diagnostic plots do not show any apparent issue.

math.lm <- lm(MATH ~ ESCS, data = pisa)
summary(math.lm)
## 
## Call:
## lm(formula = MATH ~ ESCS, data = pisa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -281.73  -53.77   -1.26   55.68  292.67 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   512.23       2.30  222.89  < 2e-16 ***
## ESCS           19.65       2.42    8.12  1.1e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 81.1 on 1267 degrees of freedom
## Multiple R-squared:  0.0494, Adjusted R-squared:  0.0487 
## F-statistic: 65.9 on 1 and 1267 DF,  p-value: 1.11e-15
par(mfrow = c(2, 3), pty = "s", mar = rep(3, 4))
plot(math.lm, which = 1:6)

However, the model residuals are strongly clustered by school, a clear sign that a school effect should be included. This is also confirmed by a standard F-test.

bwplot(getGroups(pisa) ~ resid(math.lm))
math.lm <- update(math.lm, . ~ . + factor(SCHOOLID))
bwplot(getGroups(pisa) ~ resid(math.lm))
anova(math.lm)

Should we include a school-specific slope for ESCS? The F-test does not support this.

math.lm <- update(math.lm, . ~ . + factor(SCHOOLID) * ESCS)
anova(math.lm) 

A useful graphical display can be obtained with the lmList function, which fits a separate linear model for each school. However, the function fails if any variable has missing values, so we restrict attention to the columns with no missing values. The output displays school-specific confidence intervals for the estimated intercepts and slopes; the variability of the intercepts is clearly larger.

pisa_nas <- colSums(is.na(pisa))
math.list <- lmList(MATH ~ ESCS | SCHOOLID, pisa[,  names(pisa)[pisa_nas==0]])
plot(intervals(math.list))

First mixed models

We start from a null multilevel model. The model is fitted by the lme function.

math.lme.null <- lme(MATH ~ 1, random = ~ 1 | SCHOOLID, data = pisa)
summary(math.lme.null)
## Linear mixed-effects model fit by REML
##   Data: pisa 
##     AIC   BIC logLik
##   14451 14466  -7222
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       47.71     68.8
## 
## Fixed effects:  MATH ~ 1 
##             Value Std.Error   DF t-value p-value
## (Intercept) 508.6     7.788 1229   65.31       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.4287735 -0.6631139 -0.0006406  0.6524297  3.6343748 
## 
## Number of Observations: 1269
## Number of Groups: 40
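
From the variance components of the null model we can compute the intraclass correlation, i.e. the share of total variance attributable to schools. A small sketch using VarCorr, whose entries match the StdDev estimates printed above:

```r
# Intraclass correlation from the null model's variance components
vc <- VarCorr(math.lme.null)                 # character matrix: Variance, StdDev
sd_school <- as.numeric(vc["(Intercept)", "StdDev"])
sd_resid  <- as.numeric(vc["Residual", "StdDev"])
sd_school^2 / (sd_school^2 + sd_resid^2)     # about 0.32 with the estimates above
```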

Comparing the fit to that of the LM clearly shows the shrinkage effect.

math.list <- lmList(MATH ~ 1 | SCHOOLID, pisa[,  names(pisa)[pisa_nas==0]])
comp.math <- compareFits(coef(math.list), coef(math.lme.null))
plot(comp.math, mark = fixef(math.lme.null))

We can add some predictors.

math.lme <- lme(MATH ~ ESCS + KIND + SEX + GRADE, random = ~ 1 | SCHOOLID, data = pisa) 
summary(math.lme)
## Linear mixed-effects model fit by REML
##   Data: pisa 
##     AIC   BIC logLik
##   14375 14411  -7180
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       44.39    67.31
## 
## Fixed effects:  MATH ~ ESCS + KIND + SEX + GRADE 
##             Value Std.Error   DF t-value p-value
## (Intercept) 466.4    14.423 1225   32.33  0.0000
## ESCS          5.2     2.378 1225    2.18  0.0291
## KIND          1.3    11.598 1225    0.11  0.9103
## SEXMale      26.1     5.106 1225    5.10  0.0000
## GRADE        33.6     5.626 1225    5.97  0.0000
##  Correlation: 
##         (Intr) ESCS   KIND   SEXMal
## ESCS     0.109                     
## KIND    -0.768 -0.070              
## SEXMale -0.206 -0.127 -0.014       
## GRADE   -0.324 -0.026 -0.035  0.105
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -3.63601 -0.65492 -0.00788  0.62557  3.45825 
## 
## Number of Observations: 1269
## Number of Groups: 40

By default, the model is estimated by REML, but we can switch to maximum likelihood estimation.

math.lme.ML <-  lme(MATH ~ ESCS + KIND + SEX + GRADE, random = ~ 1 | SCHOOLID, 
                    data = pisa, method = "ML")
summary(math.lme.ML)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC logLik
##   14401 14437  -7194
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       43.72    67.21
## 
## Fixed effects:  MATH ~ ESCS + KIND + SEX + GRADE 
##             Value Std.Error   DF t-value p-value
## (Intercept) 466.3    14.382 1225   32.43  0.0000
## ESCS          5.2     2.378 1225    2.20  0.0280
## KIND          1.3    11.602 1225    0.11  0.9101
## SEXMale      26.0     5.104 1225    5.10  0.0000
## GRADE        33.6     5.628 1225    5.97  0.0000
##  Correlation: 
##         (Intr) ESCS   KIND   SEXMal
## ESCS     0.109                     
## KIND    -0.770 -0.070              
## SEXMale -0.206 -0.127 -0.014       
## GRADE   -0.325 -0.026 -0.035  0.105
## 
## Standardized Within-Group Residuals:
##     Min      Q1     Med      Q3     Max 
## -3.6409 -0.6549 -0.0096  0.6275  3.4638 
## 
## Number of Observations: 1269
## Number of Groups: 40

Confidence intervals for model parameters are readily computed.

intervals(math.lme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower    est.  upper
## (Intercept) 438.0671 466.364 494.66
## ESCS          0.5294   5.195   9.86
## KIND        -21.4472   1.307  24.06
## SEXMale      16.0331  26.051  36.07
## GRADE        22.5423  33.580  44.62
## 
##  Random Effects:
##   Level: SCHOOLID 
##                 lower  est. upper
## sd((Intercept)) 34.87 44.39 56.52
## 
##  Within-group standard error:
## lower  est. upper 
## 64.70 67.31 70.03

Model components can also be extracted.

fixef(math.lme)
## (Intercept)        ESCS        KIND     SEXMale       GRADE 
##     466.364       5.195       1.307      26.051      33.580
ranef(math.lme)
coef(math.lme)
VarCorr(math.lme)
## SCHOOLID = pdLogChol(1) 
##             Variance StdDev
## (Intercept) 1971     44.39 
## Residual    4531     67.31

Two kinds of residuals can be obtained: within-group residuals (level = 1)

r1 <- resid(math.lme, level = 1, type = "p")
bwplot(getGroups(math.lme) ~ r1)   

and population residuals (level = 0).

r0 <- resid(math.lme, level = 0, type = "p")
bwplot(getGroups(math.lme) ~ r0)  

We can also visualize some diagnostic plots:

plot(math.lme, MATH ~ fitted(.))
plot(math.lme, resid(.) ~ ESCS, abline = 0)
plot(math.lme, resid(., type = "p") ~ fitted(.))
qqnorm(math.lme, ~ resid(.))  
qqnorm(math.lme, ~ resid(.) | SEX)  
qqnorm(math.lme, ~ ranef(.) | LYCEUM)

Significance of variance components

For testing whether the random intercept variance is zero, we first apply a Wald test based on the appropriate asymptotic distribution:

V <- as.matrix(math.lme$apVar)
sigma <- exp(attr(math.lme$apVar, "Pars")[1])
se.lsigma <- sqrt(V[1, 1]) 
se.sigma <- se.lsigma * sigma
1 - pnorm(sigma / se.sigma)  
## reStruct.SCHOOLID 
##          2.22e-16
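
A related check is a likelihood ratio test against the model without the random intercept, fitted by gls. Since the null value lies on the boundary of the parameter space, the asymptotic null distribution is a 50:50 mixture of a point mass at zero and a chi-squared with 1 df, so the naive p-value is halved. A sketch:

```r
# LRT for the random intercept: LMM (REML) vs. gls fit without it
math.gls <- gls(MATH ~ ESCS + KIND + SEX + GRADE, data = pisa, method = "REML")
w <- 2 * (logLik(math.lme) - logLik(math.gls))
# boundary problem: null distribution is 0.5*chi2_0 + 0.5*chi2_1
0.5 * pchisq(as.numeric(w), df = 1, lower.tail = FALSE)
```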

Then we consider testing the nullity of a random slope, a more challenging problem. To this end, we simulate the null distribution of the LRT; here we generate just a few samples, while actual testing would require many more.

math.slope <- update(math.lme, random = ~ ESCS | SCHOOLID)
w.oss <- 2 * (logLik(math.slope, REML = TRUE) - logLik(math.lme, REML = TRUE)) 
ogg.sim <- simulate.lme(math.lme, data = pisa, nsim = 10,
                      seed = 1988, m2 = list(random = ~ ESCS | SCHOOLID))
w.sim <- 2 * (ogg.sim$alt$REML[, 2] - ogg.sim$null$REML[, 2])
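
With a larger nsim, the simulated values provide an empirical p-value for the observed statistic; a common analytical approximation for testing one extra random effect uses a 50:50 mixture of chi-squared distributions with 1 and 2 df. A sketch, reusing the objects built above:

```r
# Empirical p-value (in practice use nsim of 1000 or more)
mean(w.sim >= as.numeric(w.oss))
# Approximate p-value from the 0.5*chi2_1 + 0.5*chi2_2 mixture
0.5 * pchisq(as.numeric(w.oss), df = 1, lower.tail = FALSE) +
  0.5 * pchisq(as.numeric(w.oss), df = 2, lower.tail = FALSE)
```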

Further issues

First, we verify the equivalence between a random intercept model and a marginal model with a compound-symmetry covariance matrix.

math.marg <- gls(MATH ~ ESCS + KIND + SEX + GRADE ,data = pisa,
                 method = "REML", correlation = corCompSymm(form = ~ 1 | SCHOOLID))
cbind(math.marg$coef, fixef(math.lme))
##                [,1]    [,2]
## (Intercept) 466.364 466.364
## ESCS          5.195   5.195
## KIND          1.307   1.307
## SEXMale      26.051  26.051
## GRADE        33.580  33.580
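
The equivalence extends to the covariance structure: the intra-school correlation implied by the LMM variance components should match the compound-symmetry parameter Rho estimated by gls. A quick check:

```r
# Implied intra-school correlation from the random intercept model
vc <- VarCorr(math.lme)
s2_school <- as.numeric(vc["(Intercept)", "Variance"])
s2_resid  <- as.numeric(vc["Residual", "Variance"])
s2_school / (s2_school + s2_resid)   # compare with the Rho from summary(math.marg)
```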

Another useful twist is to trick lme into fitting a bivariate response model.

pisa2 <- rbind(pisa, pisa)
pisa2$Y <- c(pisa$MATH, pisa$READ)
pisa2$STUD <- c(1:nrow(pisa), 1:nrow(pisa))
pisa2$TYPOR <- factor(c(rep("math", nrow(pisa)), rep("read", nrow(pisa))))
mod.biv <-lme(Y ~ ESCS * TYPOR - 1 - ESCS, weights = varIdent(form = ~ 1 | TYPOR), 
             data = pisa2, correlation = corCompSymm(form = ~ 1 | SCHOOLID / STUD),
             random = ~ TYPOR - 1 | SCHOOLID)
summary(mod.biv)
## Linear mixed-effects model fit by REML
##   Data: pisa2 
##     AIC   BIC logLik
##   28504 28562 -14242
## 
## Random effects:
##  Formula: ~TYPOR - 1 | SCHOOLID
##  Structure: General positive-definite, Log-Cholesky parametrization
##           StdDev Corr  
## TYPORmath 46.07  TYPORm
## TYPORread 50.10  0.913 
## Residual  68.68        
## 
## Correlation Structure: Compound symmetry
##  Formula: ~1 | SCHOOLID/STUD 
##  Parameter estimate(s):
##    Rho 
## 0.4609 
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | TYPOR 
##  Parameter estimates:
##   math   read 
## 1.0000 0.9858 
## Fixed effects:  Y ~ ESCS * TYPOR - 1 - ESCS 
##                Value Std.Error   DF t-value p-value
## TYPORmath      509.4     7.544 2495   67.52  0.0000
## TYPORread      512.2     8.154 2495   62.82  0.0000
## ESCS:TYPORmath   5.9     2.372 2495    2.47  0.0135
## ESCS:TYPORread   3.8     2.356 2495    1.61  0.1078
##  Correlation: 
##                TYPORm TYPORr ESCS:TYPORm
## TYPORread      0.885                    
## ESCS:TYPORmath 0.043  0.020             
## ESCS:TYPORread 0.022  0.040  0.496      
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -3.55792 -0.64748  0.02186  0.65977  3.60312 
## 
## Number of Observations: 2538
## Number of Groups: 40
intervals(mod.biv)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                   lower    est.   upper
## TYPORmath      494.5937 509.386 524.179
## TYPORread      496.1931 512.182 528.171
## ESCS:TYPORmath   1.2147   5.867  10.519
## ESCS:TYPORread  -0.8296   3.791   8.412
## 
##  Random Effects:
##   Level: SCHOOLID 
##                            lower    est.   upper
## sd(TYPORmath)            36.2618 46.0737 58.5407
## sd(TYPORread)            39.5131 50.1001 63.5236
## cor(TYPORmath,TYPORread)  0.8193  0.9132  0.9594
## 
##  Correlation structure:
##      lower   est.  upper
## Rho 0.4158 0.4609 0.5039
## 
##  Variance function:
##       lower   est. upper
## read 0.9367 0.9858 1.037
## 
##  Within-group standard error:
## lower  est. upper 
## 65.99 68.68 71.48

Model building

We consider the effect of further covariates for the model with the MATH score as response, and perform some model building. As a first step, we separate the “between” and “within” effects of ESCS: this is done by introducing the school mean ESCSM among the predictors and centering ESCS around its school mean (the CESCS variable, used in the math.lme2 model). Note that we redefine the model summary to avoid printing the correlation among the estimated fixed effects.

math.lme <-  lme(MATH ~ ESCS + SEX + GRADE, random = ~ 1 | SCHOOLID, 
                 data = pisa, method = "ML")
math.lme2 <- lme(MATH  ~ CESCS + SEX + GRADE + ESCSM, random = ~ 1 | SCHOOLID, 
                 data = pisa, method = "ML")
mysummary <- function(object){
  # suppress printing of the correlation of fixed effects
  assignInNamespace("print.correlation",
                    function(x, title) return(), ns = "nlme")
  summary(object)
}
mysummary(math.lme2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC logLik
##   14385 14421  -7185
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       35.17     67.2
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM 
##             Value Std.Error   DF t-value p-value
## (Intercept) 474.0     8.316 1226   57.00  0.0000
## CESCS         3.7     2.409 1226    1.52  0.1281
## SEXMale      28.1     5.061 1226    5.55  0.0000
## GRADE        33.3     5.621 1226    5.93  0.0000
## ESCSM        56.9    11.694   38    4.87  0.0000
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.6511706 -0.6504630  0.0005975  0.6373420  3.3545257 
## 
## Number of Observations: 1269
## Number of Groups: 40

We compare the two models (both fitted by ML) by information criteria.

anova(math.lme, math.lme2) 

We update the model since there is a substantial improvement.

math.lme <- math.lme2

Then we add student-level predictors, but there’s little improvement.

math.lme2 <- update(math.lme, .~. + KIND + FAMSTRUC + CHANGE + EARLY)
mysummary(math.lme2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC logLik
##   14390 14462  -7181
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       35.66    66.93
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM + KIND + FAMSTRUC + CHANGE +      EARLY 
##             Value Std.Error   DF t-value p-value
## (Intercept) 503.7     69.78 1219   7.219  0.0000
## CESCS         3.7      2.42 1219   1.512  0.1308
## SEXMale      28.2      5.08 1219   5.556  0.0000
## GRADE        33.9      5.74 1219   5.912  0.0000
## ESCSM        57.4     11.88   38   4.835  0.0000
## KIND          1.5     11.59 1219   0.126  0.8996
## FAMSTRUC      6.1      4.92 1219   1.234  0.2176
## CHANGEMiss  -17.9     76.36 1219  -0.234  0.8149
## CHANGEN/A    14.4     71.74 1219   0.201  0.8406
## CHANGENo    -27.5     68.80 1219  -0.399  0.6898
## CHANGEYes   -37.5     68.24 1219  -0.549  0.5830
## EARLY        -2.0     10.22 1219  -0.193  0.8470
## 
## Standardized Within-Group Residuals:
##       Min        Q1       Med        Q3       Max 
## -3.681082 -0.652393  0.002752  0.634641  3.460378 
## 
## Number of Observations: 1269
## Number of Groups: 40

We proceed to add a school-level variable, the type of school. Here there’s some improvement, since both AIC and BIC are smaller.

math.lme2 <- update(math.lme, . ~ . + LYCEUM + TECH)
mysummary(math.lme2) 
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC logLik
##   14372 14418  -7177
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:       27.46    67.19
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM + LYCEUM + TECH 
##             Value Std.Error   DF t-value p-value
## (Intercept) 440.1    12.943 1226   34.00  0.0000
## CESCS         3.6     2.410 1226    1.51  0.1315
## SEXMale      28.6     4.981 1226    5.74  0.0000
## GRADE        32.5     5.624 1226    5.79  0.0000
## ESCSM        45.8    15.063   36    3.04  0.0044
## LYCEUM       34.4    20.283   36    1.70  0.0983
## TECH         53.7    11.910   36    4.51  0.0001
## 
## Standardized Within-Group Residuals:
##       Min        Q1       Med        Q3       Max 
## -3.673400 -0.642607  0.009199  0.633599  3.331784 
## 
## Number of Observations: 1269
## Number of Groups: 40
math.lme <- math.lme2

We then add further context variables; only SCHLSIZE seems worth retaining.

math.lme2 <- update(math.lme, .~. + SCHLSIZE + ISTMED + ISTLAR + PCGMED + PCGALT)
options(digits = 5)
mysummary(math.lme2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC  logLik
##   14373 14445 -7172.7
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:      24.296   67.189
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM + LYCEUM + TECH + SCHLSIZE +      ISTMED + ISTLAR + PCGMED + PCGALT 
##              Value Std.Error   DF t-value p-value
## (Intercept) 400.82   23.7894 1226 16.8485  0.0000
## CESCS         3.71    2.4170 1226  1.5337  0.1254
## SEXMale      27.53    5.2418 1226  5.2524  0.0000
## GRADE        32.81    5.6319 1226  5.8253  0.0000
## ESCSM        26.91   16.5168   31  1.6290  0.1134
## LYCEUM       52.39   20.2194   31  2.5912  0.0145
## TECH         57.24   11.3199   31  5.0564  0.0000
## SCHLSIZE      0.03    0.0142   31  2.4664  0.0194
## ISTMED        2.01   15.8283   31  0.1273  0.8995
## ISTLAR       14.26   16.8025   31  0.8486  0.4026
## PCGMED       10.74   14.0147   31  0.7661  0.4494
## PCGALT       -8.44   12.3960   31 -0.6808  0.5011
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.7243603 -0.6480212 -0.0014143  0.6281788  3.3454254 
## 
## Number of Observations: 1269
## Number of Groups: 40
math.lme2 <- update(math.lme, . ~ . + SCHLSIZE)
math.lme <- math.lme2

Finally, we add more school description variables, retaining one of them.

math.lme2 <- update(math.lme, .~. + SCMATEDU + SCMATBUI + MEDDISC, na.action = na.omit)
options(digits = 5)
mysummary(math.lme2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC  logLik
##   14372 14439 -7172.9
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:      24.361   67.192
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM + LYCEUM + TECH + SCHLSIZE +      SCMATEDU + SCMATBUI + MEDDISC 
##              Value Std.Error   DF t-value p-value
## (Intercept) 426.96   16.3239 1226 26.1556  0.0000
## CESCS         3.60    2.4136 1226  1.4926  0.1358
## SEXMale      29.10    4.9751 1226  5.8497  0.0000
## GRADE        32.69    5.6289 1226  5.8080  0.0000
## ESCSM        32.54   15.2608   32  2.1322  0.0408
## LYCEUM       33.99   19.1488   32  1.7749  0.0854
## TECH         48.45   12.2748   32  3.9474  0.0004
## SCHLSIZE      0.02    0.0137   32  1.7833  0.0840
## SCMATEDU      0.04    5.5670   32  0.0067  0.9947
## SCMATBUI      0.24    6.7308   32  0.0364  0.9712
## MEDDISC      21.82   13.4689   32  1.6203  0.1150
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.7098825 -0.6491654 -0.0072556  0.6338338  3.3512952 
## 
## Number of Observations: 1269
## Number of Groups: 40
math.lme2 <- update(math.lme, .~. + MEDDISC, na.action = na.omit)

As a result, we retain the following model, which seems sensible. Note the reduced between-school variance compared to the initial models.

options(digits = 5)
mysummary(math.lme2)
## Linear mixed-effects model fit by maximum likelihood
##   Data: pisa 
##     AIC   BIC  logLik
##   14368 14424 -7172.9
## 
## Random effects:
##  Formula: ~1 | SCHOOLID
##         (Intercept) Residual
## StdDev:      24.362   67.192
## 
## Fixed effects:  MATH ~ CESCS + SEX + GRADE + ESCSM + LYCEUM + TECH + SCHLSIZE +      MEDDISC 
##              Value Std.Error   DF t-value p-value
## (Intercept) 426.88   16.0739 1226 26.5574  0.0000
## CESCS         3.60    2.4115 1226  1.4937  0.1355
## SEXMale      29.11    4.9457 1226  5.8861  0.0000
## GRADE        32.69    5.6241 1226  5.8133  0.0000
## ESCSM        32.63   14.5815   34  2.2380  0.0319
## LYCEUM       34.00   18.6211   34  1.8257  0.0767
## TECH         48.73   11.0658   34  4.4038  0.0001
## SCHLSIZE      0.02    0.0133   34  1.8281  0.0763
## MEDDISC      21.72   12.4499   34  1.7447  0.0901
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.7102699 -0.6492130 -0.0070226  0.6328997  3.3514873 
## 
## Number of Observations: 1269
## Number of Groups: 40

Crossed random effects

Fitting crossed random effects requires the lme4 package. Here we take a simple example from the Help files of the package, the Penicillin data.

library(lme4) 
data(Penicillin)
?Penicillin

The crossed structure is easily appreciated.

head(Penicillin)
xtabs( ~ sample + plate, Penicillin)
##       plate
## sample a b c d e f g h i j k l m n o p q r s t u v w x
##      A 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      C 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      D 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##      F 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The fit is performed by the lmer function. Note the slightly different syntax compared to lme.

mod1 <- lmer(diameter ~ 1 + (1 | sample), data = Penicillin)
mod2 <- lmer(diameter ~ 1 + (1 | plate), data = Penicillin)
mod.vc <- lmer(diameter ~ 1 + (1 | sample) + (1 | plate), data = Penicillin)
summary(mod.vc)
## Linear mixed model fit by REML ['lmerMod']
## Formula: diameter ~ 1 + (1 | sample) + (1 | plate)
##    Data: Penicillin
## 
## REML criterion at convergence: 330.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0792 -0.6714  0.0629  0.5838  2.9796 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  plate    (Intercept) 0.717    0.847   
##  sample   (Intercept) 3.731    1.932   
##  Residual             0.302    0.550   
## Number of obs: 144, groups:  plate, 24; sample, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   22.972      0.809    28.4

There are some differences between lme and lmer: the latter handles crossed random effects and is generally faster, but it lacks some features of lme, such as correlation and variance-function structures.
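
For instance, lmer does not report p-values, but profile-likelihood confidence intervals for all parameters, including the variance components, are available through confint. A sketch (profiling may take a few seconds):

```r
# Profile-likelihood confidence intervals for the crossed-effects model
confint(mod.vc, oldNames = FALSE)
```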

GLMMs

R has several tools for GLMMs. Some useful information is available at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html.

Multicenter clinical trial

We revisit the multicenter clinical trial already described in class, starting from a standard logistic regression with only the dummy variable for treatment.

npos <- c(11, 16, 14, 2, 6, 1, 1, 4, 10, 22, 7, 1, 0, 0, 1, 6)
ntot <- c(36, 20, 19, 16, 17, 11, 5, 6, 37, 32, 19, 17, 12, 10, 9, 7)
treatment <- c(rep(1,8), rep(0,8))
clinic <-c(seq(8), seq(8))
booth <- data.frame(succ = npos, den = ntot, treat = treatment, cli = clinic)
booth$treat <- factor(booth$treat)
bh.glm0 <- glm(cbind(succ, den - succ) ~ treat, data = booth, family = binomial)
summary(bh.glm0)
## 
## Call:
## glm(formula = cbind(succ, den - succ) ~ treat, family = binomial, 
##     data = booth)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.093  -2.491  -0.913   1.594   4.145  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.714      0.178   -4.01    6e-05 ***
## treat1         0.404      0.251    1.61     0.11    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 93.555  on 15  degrees of freedom
## Residual deviance: 90.960  on 14  degrees of freedom
## AIC: 133.4
## 
## Number of Fisher Scoring iterations: 4

Since the data are binomial (with denominators larger than one), some residual plots are meaningful:

library(boot)
glm.diag.plots(bh.glm0)

Adding the clinic effect improves the fit.

bh.glm1 <- update(bh.glm0, . ~ . + factor(cli))
glm.diag.plots(bh.glm1)

This is also suggested by the AIC.

AIC(bh.glm0, bh.glm1) 
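
The same conclusion follows from a likelihood ratio test between the two nested fits:

```r
# Analysis of deviance for the clinic effect
anova(bh.glm0, bh.glm1, test = "Chisq")
```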

Mixed effects logistic regression

There are several packages for GLMMs. PQL estimates are provided by the glmmPQL function in the MASS package.

bh.pql <- glmmPQL(cbind(succ, den - succ) ~ treat, 
                  random = ~ 1 | cli, 
                  data = booth, family = "binomial")
summary(bh.pql)
## Linear mixed-effects model fit by maximum likelihood
##   Data: booth 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | cli
##         (Intercept) Residual
## StdDev:      1.3225  0.96698
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects:  cbind(succ, den - succ) ~ treat 
##                Value Std.Error DF t-value p-value
## (Intercept) -1.14011   0.55779  7 -2.0440  0.0802
## treat1       0.72083   0.30555  7  2.3591  0.0504
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -1.50397 -0.70106 -0.23830  0.47329  1.27708 
## 
## Number of Observations: 16
## Number of Groups: 8
intervals(bh.pql)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                 lower     est.    upper
## (Intercept) -2.373883 -1.14011 0.093662
## treat1       0.044975  0.72083 1.396689
## 
##  Random Effects:
##   Level: cli 
##                   lower   est.  upper
## sd((Intercept)) 0.73693 1.3225 2.3734
## 
##  Within-group standard error:
##   lower    est.   upper 
## 0.58319 0.96698 1.60331

Note that, since the fitting function is a wrapper around lme, the summary reports a within-group standard error (the "Residual" line), which is a meaningless quantity for logistic regression.

If we compare the fixed-effects estimates of the clinic effects with the predicted random effects, the shrinkage effect is apparent. Note that we need to re-parameterise the linear predictor of the fixed-effects model (dropping the overall intercept) to make the two sets of estimates comparable.

bh.glm1 <- glm(cbind(succ, den - succ) ~ factor(cli) - 1 + treat, data = booth,
               family = binomial)
tabe <- cbind(coef(bh.pql)[, 1], bh.glm1$coef[-9])
colnames(tabe) <- c("Random", "Fixed")
knitr::kable(tabe)
                Random     Fixed
factor(cli)1  -1.28457  -1.32201
factor(cli)2   0.65624   0.73342
factor(cli)3  -0.19744  -0.16911
factor(cli)4  -2.46611  -2.74047
factor(cli)5  -1.73615  -1.84191
factor(cli)6  -2.75816  -3.46893
factor(cli)7  -1.87599  -2.11972
factor(cli)8   0.54129   0.88590
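The shrinkage can also be inspected graphically; a minimal sketch, assuming the tabe matrix computed above:

```r
# Plot random-effect predictions against fixed-effect estimates: shrinkage
# pulls the cluster-specific predictions toward the overall mean
plot(tabe[, "Fixed"], tabe[, "Random"],
     xlab = "Fixed-effect estimate", ylab = "Random-effect prediction")
abline(0, 1, lty = 2)  # identity line for reference
```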

Maximum likelihood estimation, based on the Laplace approximation or on Gauss-Hermite quadrature, is implemented by the glmmML package.

library(glmmML)
bh.ML1 <- glmmML(cbind(succ, den - succ) ~ treat, 
                 data = booth, 
                 family = binomial(logit),
                 cluster = booth$cli, method = "Laplace")
bh.ML10 <- glmmML(cbind(succ, den - succ) ~ treat, 
                  data = booth, 
                  family = binomial(logit),
                  cluster = booth$cli, method = "ghq", n.points = 10) 
bh.ML1
## 
## Call:  glmmML(formula = cbind(succ, den - succ) ~ treat, family = binomial(logit),      data = booth, cluster = booth$cli, method = "Laplace") 
## 
## 
##               coef se(coef)     z Pr(>|z|)
## (Intercept) -1.196    0.552 -2.17    0.030
## treat1       0.738    0.300  2.46    0.014
## 
## Scale parameter in mixing distribution:  1.39 gaussian 
## Std. Error:                              0.382 
## 
##         LR p-value for H_0: sigma = 0:  5.59e-14 
## 
## Residual deviance: 35.8 on 13 degrees of freedom     AIC: 41.8
bh.ML10
## 
## Call:  glmmML(formula = cbind(succ, den - succ) ~ treat, family = binomial(logit),      data = booth, cluster = booth$cli, method = "ghq", n.points = 10) 
## 
## 
##               coef se(coef)     z Pr(>|z|)
## (Intercept) -1.198    0.556 -2.15    0.031
## treat1       0.738    0.300  2.46    0.014
## 
## Scale parameter in mixing distribution:  1.4 gaussian 
## Std. Error:                              0.426 
## 
##         LR p-value for H_0: sigma = 0:  5.14e-14 
## 
## Residual deviance: 35.6 on 13 degrees of freedom     AIC: 41.6

The glmmML package also allows for different distributions of the random intercepts.

bh.cauchy.ML <- glmmML(cbind(succ, den - succ) ~ treat, 
                       data = booth, 
                       family = binomial(logit),
                       cluster = booth$cli, method = "Laplace", prior = "cauchy")
bh.cauchy.ML 
## 
## Call:  glmmML(formula = cbind(succ, den - succ) ~ treat, family = binomial(logit),      data = booth, cluster = booth$cli, prior = "cauchy", method = "Laplace") 
## 
## 
##               coef se(coef)     z Pr(>|z|)
## (Intercept) -1.367     0.63 -2.17    0.030
## treat1       0.739     0.30  2.46    0.014
## 
## Scale parameter in mixing distribution:  0.928 cauchy 
## Std. Error:                              0.361 
## 
##         LR p-value for H_0: sigma = 0:  4.59e-13 
## 
## Residual deviance: 39.9 on 13 degrees of freedom     AIC: 45.9
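A direct AIC comparison between the two mixing distributions can be sketched as follows, assuming (as the printed output suggests) that the fitted glmmML objects expose an aic component:

```r
# AIC of the Gaussian vs Cauchy mixing distribution (lower is better);
# `aic` is assumed to be a component of the glmmML fit, as printed above
c(gaussian = bh.ML1$aic, cauchy = bh.cauchy.ML$aic)
```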

The lme4 package offers several useful methods, implementing adaptive Gauss-Hermite quadrature for models with a single scalar random effect. The main fitting function is glmer.

library(lme4)
bh.ML4.10 <- glmer(cbind(succ, den - succ) ~ treat + (1 | cli), 
                   data = booth,
                   family = binomial(logit), nAGQ = 10)
bh.ML4.10
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(succ, den - succ) ~ treat + (1 | cli)
##    Data: booth
##      AIC      BIC   logLik deviance df.resid 
##   41.648   43.966  -17.824   35.648       13 
## Random effects:
##  Groups Name        Std.Dev.
##  cli    (Intercept) 1.4     
## Number of obs: 16, groups:  cli, 8
## Fixed Effects:
## (Intercept)       treat1  
##      -1.198        0.738

Laplace-based estimation is also available in the awesome glmmTMB package.

library(glmmTMB)
booth$cli <- factor(booth$cli)
bh.TMB <- glmmTMB(succ / den ~ I(treat==1) + (1 | cli), 
                  data = booth, family = "binomial",
                  weights = den)
summary(bh.TMB)
##  Family: binomial  ( logit )
## Formula:          succ/den ~ I(treat == 1) + (1 | cli)
## Data: booth
## Weights: den
## 
##      AIC      BIC   logLik deviance df.resid 
##     80.2     82.5    -37.1     74.2       13 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  cli    (Intercept) 1.93     1.39    
## Number of obs: 16, groups:  cli, 8
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)  
## (Intercept)         -1.196      0.553   -2.16    0.030 *
## I(treat == 1)TRUE    0.738      0.300    2.46    0.014 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
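As a sanity check, the fixed-effect estimates from the three maximum likelihood fits can be collected in one table (a sketch, assuming bh.ML1, bh.ML4.10 and bh.TMB have been fitted as above):

```r
# The three ML implementations should agree closely on the fixed effects
rbind(glmmML  = coef(bh.ML1),
      glmer   = fixef(bh.ML4.10),
      glmmTMB = fixef(bh.TMB)$cond)
```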

A count data example

For a count data setting, we consider the famous Epilepsy data (https://journals.sagepub.com/doi/abs/10.1191/1471082X03st058oa).

data(epil2, package = "glmmTMB")
help(epil2)
epil2$subject <- factor(epil2$subject)

We start, as usual, with a suitable visualization. Note that subjects 1-28 received a placebo.

xyplot(y ~ period | subject, data = epil2)

We fit a Poisson mixed model with random intercepts and slopes, both via PQL and via MLE based on the Laplace approximation. The substantial bias of PQL is apparent.

mod.pois.pql <- glmmPQL(y ~ Base * trt + Age + Visit, 
                        random = ~ Visit | subject,
                        data = epil2, family = "poisson")
mod.pois.glmer <- glmer(y ~ Base * trt + Age + Visit + (Visit | subject),
                        data = epil2, family = "poisson") 
mod.pois.pql
## Linear mixed-effects model fit by maximum likelihood
##   Data: epil2 
##   Log-likelihood: NA
##   Fixed: y ~ Base * trt + Age + Visit 
##       (Intercept)              Base      trtprogabide               Age 
##           -1.5161            0.8778           -0.9239            0.5394 
##             Visit Base:trtprogabide 
##           -0.2869            0.3475 
## 
## Random effects:
##  Formula: ~Visit | subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 0.4486 (Intr)
## Visit       0.4749 0.094 
## Residual    1.3573       
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Number of Observations: 236
## Number of Groups: 59
mod.pois.glmer
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: y ~ Base * trt + Age + Visit + (Visit | subject)
##    Data: epil2
##      AIC      BIC   logLik deviance df.resid 
##   1328.8   1360.0   -655.4   1310.8      227 
## Random effects:
##  Groups  Name        Std.Dev. Corr
##  subject (Intercept) 0.499        
##          Visit       0.736    0.01
## Number of obs: 236, groups:  subject, 59
## Fixed Effects:
##       (Intercept)               Base       trtprogabide                Age  
##            -1.364              0.883             -0.931              0.476  
##             Visit  Base:trtprogabide  
##            -0.269              0.340  
## optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
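A side-by-side table of the fixed effects makes the comparison concrete (a sketch, assuming the two fits above; nlme's fixef works on the glmmPQL object since it inherits from lme):

```r
# PQL vs Laplace-based MLE fixed effects; note also the attenuated Visit
# standard deviation in the PQL fit shown above
cbind(PQL = fixef(mod.pois.pql), MLE = fixef(mod.pois.glmer))
```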

The Poisson model is well estimated by glmmTMB, which employs the Laplace approximation. The package also implements negative binomial mixed models, which in this case seem to provide a better fit.

mod.pois <- glmmTMB(y ~ Base * trt + Age + Visit + (Visit | subject), 
                    data = epil2, family = "poisson")   
summary(mod.pois)    
##  Family: poisson  ( log )
## Formula:          y ~ Base * trt + Age + Visit + (Visit | subject)
## Data: epil2
## 
##      AIC      BIC   logLik deviance df.resid 
##   1328.8   1360.0   -655.4   1310.8      227 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance Std.Dev. Corr 
##  subject (Intercept) 0.249    0.499         
##          Visit       0.542    0.736    0.01 
## Number of obs: 236, groups:  subject, 59
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.356      1.197   -1.13    0.257    
## Base                 0.884      0.131    6.76  1.4e-11 ***
## trtprogabide        -0.929      0.401   -2.32    0.020 *  
## Age                  0.473      0.353    1.34    0.180    
## Visit               -0.269      0.165   -1.63    0.104    
## Base:trtprogabide    0.339      0.204    1.66    0.096 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod.nbin1 <- glmmTMB(y ~ Base * trt + Age + Visit + (Visit | subject),
                     data = epil2, family = "nbinom1")  
anova(mod.pois, mod.nbin1) 
summary(mod.nbin1) 
##  Family: nbinom1  ( log )
## Formula:          y ~ Base * trt + Age + Visit + (Visit | subject)
## Data: epil2
## 
##      AIC      BIC   logLik deviance df.resid 
##   1287.3   1321.9   -633.6   1267.3      226 
## 
## Random effects:
## 
## Conditional model:
##  Groups  Name        Variance Std.Dev. Corr 
##  subject (Intercept) 0.191    0.437         
##          Visit       0.163    0.404    0.08 
## Number of obs: 236, groups:  subject, 59
## 
## Dispersion parameter for nbinom1 family (): 1.09 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.538      1.168   -1.32    0.188    
## Base                 0.852      0.126    6.75  1.5e-11 ***
## trtprogabide        -0.941      0.401   -2.35    0.019 *  
## Age                  0.557      0.342    1.63    0.104    
## Visit               -0.276      0.180   -1.53    0.125    
## Base:trtprogabide    0.359      0.199    1.81    0.071 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that allowing for overdispersion (conditional on the random effects) modifies the estimates.
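The change in the conditional fixed effects can be tabulated directly (a sketch, assuming mod.pois and mod.nbin1 from above):

```r
# Conditional fixed effects under equidispersion (Poisson) vs
# overdispersion (nbinom1)
cbind(Poisson = fixef(mod.pois)$cond, NegBin = fixef(mod.nbin1)$cond)
```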

A three-level example

We consider an often-studied dataset, first introduced in https://rss.onlinelibrary.wiley.com/doi/10.2307/2983404. The data are also available in the mlmRev package.

library(mlmRev)
data(guImmun)
head(guImmun)
summary(guImmun)
##       kid            mom            comm      immun    kid2p    mom25p  
##  2      :   1   310    :   3   185    :  55   N:1195   N: 493   N:1038  
##  269    :   1   384    :   3   210    :  50   Y: 964   Y:1666   Y:1121  
##  272    :   1   456    :   3   226    :  34                             
##  273    :   1   464    :   3   227    :  32                             
##  274    :   1   498    :   3   174    :  30                             
##  275    :   1   514    :   3   188    :  30                             
##  (Other):2153   (Other):2141   (Other):1928                             
##  ord      ethn     momEd    husEd    momWork  rural       pcInd81      
##  01:380   L:1283   N:1050   N: 676   N:1187   N: 519   Min.   :0.0067  
##  23:740   N: 374   P: 963   P:1056   Y: 972   Y:1640   1st Qu.:0.0806  
##  46:721   S: 502   S: 146   S: 245                     Median :0.5069  
##  7p:318                     U: 182                     Mean   :0.4666  
##                                                        3rd Qu.:0.8349  
##                                                        Max.   :0.9959  
## 

The logistic regression model, ignoring the hierarchical data structure, is readily fitted.

guI.glm <- glm(immun ~ kid2p + mom25p + ord  +
                ethn + momEd + husEd +
                momWork + rural + pcInd81, 
                family = binomial,  data = guImmun) 

There are at least three different random intercepts models:

  1. Random effects for community only.
guI.pql.comm <- glmmPQL(immun ~ kid2p + mom25p + ord  + 
                        ethn + momEd + husEd +
                        momWork + rural + pcInd81,
                        family = binomial, data = guImmun, random = ~ 1 | comm)
  2. Random effects for family only.
guI.pql.mom <- glmmPQL(immun ~ kid2p + mom25p + ord  +
                       ethn + momEd + husEd +
                       momWork + rural + pcInd81,
                       family = binomial, data = guImmun, random = ~ 1 | mom)
  3. Three-level model, with family nested in community. Note that glmer could in principle fit these models, but it fails to converge in this example.
guI.pql3 <- glmmPQL(immun ~ kid2p + mom25p + ord  + ethn + momEd + husEd +
                    momWork + rural + pcInd81,
                    family = binomial, data = guImmun, random = ~ 1 | comm / mom)
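For completeness, the corresponding glmer call can be attempted; since, as noted, it is reported not to converge on these data, it is worth wrapping it so that the script keeps running (the object name guI.glmer3 is ours):

```r
# Nested three-level fit via glmer; convergence problems typically surface
# as warnings, while tryCatch() additionally guards against hard errors
guI.glmer3 <- tryCatch(
  glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd +
          momWork + rural + pcInd81 + (1 | comm / mom),
        data = guImmun, family = binomial),
  error = function(e) NULL)
```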

glmmTMB is instead successful.

guI.TMB <- glmmTMB(immun ~ kid2p + mom25p + ord  + ethn + momEd + husEd +
                   momWork + rural + pcInd81 + (1 | comm / mom),
                   data = guImmun, family = binomial(logit))

Now we compare the various estimates, considering the ratios between the point estimates of the fixed effects and their estimated standard errors (i.e. the z-statistics). The extent of the diversity across methods is noteworthy.

mat.coef <- cbind(coef(guI.glm), guI.pql.comm$coef$fixed, guI.pql.mom$coef$fixed, 
                 guI.pql3$coef$fixed, fixef(guI.TMB)$cond)
se.pql3 <- diag(guI.pql3$varFix)^.5   
se.pql.comm <- diag(guI.pql.comm$varFix)^.5  
se.pql.mom <- diag(guI.pql.mom$varFix)^.5  
se.glm <- diag(vcov(guI.glm))^.5
se.lap <- sqrt(diag(vcov(guI.TMB)$cond))
mat.se <- cbind(se.glm, se.pql.comm, se.pql.mom, se.pql3, se.lap)
mat <- mat.coef / mat.se
options(digits = 2)
colnames(mat) <- c("glm", "PQL-comm", "PQL-family", "PQL-3 levels", "MLE-3 levels")
knitr::kable(mat)
                glm  PQL-comm  PQL-family  PQL-3 levels  MLE-3 levels
(Intercept)   -3.31     -2.76       -2.91         -2.53         -2.79
kid2pY         8.31      8.48       11.25         11.50          8.00
mom25pY       -0.64     -0.66       -1.35         -1.42         -0.78
ord23         -0.62     -0.59       -2.28         -2.39         -0.80
ord46          0.58      0.92       -0.30         -0.12          0.81
ord7p          0.79      1.00        0.78          1.02          1.08
ethnN          1.40     -0.52        1.15         -0.37         -0.33
ethnS          1.34     -0.19        0.90         -0.26         -0.14
momEdP         2.36      1.75        2.45          2.00          1.94
momEdS         1.26      0.80        1.13          0.82          0.90
husEdP         2.63      2.53        2.31          2.33          2.48
husEdS         1.06      1.31        0.93          1.14          1.29
husEdU         0.18      0.15       -0.10         -0.12          0.06
momWorkY       2.60      1.82        2.47          2.02          1.94
ruralY        -4.35     -3.05       -3.96         -3.02         -3.09
pcInd81       -3.77     -2.44       -3.31         -2.30         -2.48

A look at the estimated variance components is also worthwhile, as is a plot of the estimated random effects.

VarCorr(guI.pql.comm)
## comm = pdLogChol(1) 
##             Variance StdDev
## (Intercept) 0.39     0.63  
## Residual    0.94     0.97
VarCorr(guI.pql.mom)
## mom = pdLogChol(1) 
##             Variance StdDev
## (Intercept) 5.03     2.24  
## Residual    0.45     0.67
VarCorr(guI.pql3)
##             Variance     StdDev
## comm =      pdLogChol(1)       
## (Intercept) 0.66         0.81  
## mom =       pdLogChol(1)       
## (Intercept) 4.89         2.21  
## Residual    0.43         0.65
VarCorr(guI.TMB)
## 
## Conditional model:
##  Groups   Name        Std.Dev.
##  mom:comm (Intercept) 1.135   
##  comm     (Intercept) 0.721
qqnorm(guI.pql3, ~ ranef(., level = 2)) #family
qqnorm(guI.pql3, ~ ranef(., level = 1)) #community  

This is a rather challenging example, and indeed the Laplace-based MLE does not provide a good estimate; see https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-985X.00206.

We can perform a more careful comparison by restricting attention to the random intercepts model with the family effect only, which is the most sizeable one.

guI.ML.mom <- glmmML(immun ~ kid2p + mom25p + ord  +
                     ethn + momEd + husEd +
                     momWork + rural + pcInd81,
                     cluster = guImmun$mom,
                     method = "ghq", n.points = 10, 
                     data = guImmun, family = binomial(logit))
guI.ML.mom2 <- glmer(immun ~ kid2p + mom25p + ord  + ethn + momEd + husEd +
                     momWork + rural + pcInd81 + (1 | mom), nAGQ = 10,
                     data = guImmun, family = binomial(logit))

The results are indeed different, but the warning for glmer (not shown) points to numerical issues.

mat2 <- cbind(mat[, 3], guI.ML.mom$coefficients / guI.ML.mom$coef.sd, 
              fixef(guI.ML.mom2) / sqrt(diag(vcov(guI.ML.mom2))))
colnames(mat2) <- c("PQL-family", "MLE-family", "MLE2-family")
knitr::kable(mat2)
              PQL-family  MLE-family  MLE2-family
(Intercept)        -2.91       -2.92        -2.23
kid2pY             11.25        7.91         7.87
mom25pY            -1.35       -0.93        -0.99
ord23              -2.28       -1.14        -1.18
ord46              -0.30        0.35         0.07
ord7p               0.78        0.95         0.65
ethnN               1.15        1.18         1.27
ethnS               0.90        0.95         1.28
momEdP              2.45        2.39         1.89
momEdS              1.13        1.18         0.95
husEdP              2.31        2.30         1.99
husEdS              0.93        0.99         0.67
husEdU             -0.10       -0.02        -0.19
momWorkY            2.47        2.40         2.36
ruralY             -3.96       -3.76        -3.87
pcInd81            -3.31       -3.25        -3.78

The point is that the estimated family variance is huge (larger than 4 for the MLE implementations): this is indeed a very difficult example, requiring a lot of care!
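The family-level variance estimates behind this statement can be extracted as follows (a sketch, assuming guI.ML.mom and guI.ML.mom2 from above; for the glmer fit, VarCorr returns a per-group matrix carrying a stddev attribute):

```r
# Estimated family (mom) variance component under the two MLE fits;
# both are expected to exceed 4 on these data
c(glmmML = guI.ML.mom$sigma^2,
  glmer  = as.numeric(attr(VarCorr(guI.ML.mom2)$mom, "stddev"))^2)
```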